4 research outputs found

Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond

Author: Deng Naihao
Huang Minlie
Jia Yilin
Liu Siyang
Mihalcea Rada
Sabour Sahand
Publication date: 13/11/2023

We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive tokenizer samples variable segmentations from multiple outcomes, with sampling probabilities optimized based on task-specific data. We introduce a strategy for building a specialized vocabulary and a vocabulary merging protocol that allows for the integration of task-specific tokens into the pre-trained model's tokenization step. Through extensive experiments on psychological question-answering tasks in both Chinese and English, we find that our task-adaptive tokenization approach brings a significant improvement in generation performance while using up to 60% fewer tokens. Preliminary experiments point to promising results when using our tokenization approach with very large language models.

Comment: Accepted at the main conference of The 2023 Conference on Empirical Methods in Natural Language Processing; 8 page

arXiv.org e-Print Archive

HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Author: Chen Yulong
Deng Naihao
He Yinghui
Jia Yilin
Mihalcea Rada
Wu Yufan
Publication date: 25/10/2023

Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first- and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings for the future of NLP.

Comment: Accepted at Findings of EMNLP 202

arXiv.org e-Print Archive

You Are What You Annotate: Towards Better Models through Annotator Representations

Author: Deng Naihao
Liu Siyang
Mihalcea Rada
Wang Lu
Wu Winston
Zhang Xinliang Frederick
Publication date: 22/10/2023

Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. There are multiple reasons for such disagreements, including the subjectivity of the task, difficult cases, and unclear guidelines. Rather than simply aggregating labels to obtain data annotations, we instead try to directly model the diverse perspectives of the annotators, and explicitly account for annotators' idiosyncrasies in the modeling process by creating representations for each annotator (annotator embeddings) and also their annotations (annotation embeddings). In addition, we propose TID-8, The Inherent Disagreement - 8 dataset, a benchmark that consists of eight existing language understanding datasets that have inherent annotator disagreement. We test our approach on TID-8 and show that it helps models learn significantly better from disagreements on six different datasets in TID-8 while increasing the parameter count by less than 1%. By capturing the unique tendencies and subjectivity of individual annotators through embeddings, our representations prime AI models to be inclusive of diverse viewpoints.

Comment: Accepted to Findings of EMNLP 202

arXiv.org e-Print Archive

The Cross-lingual Conversation Summarization Challenge

Author: Bai Xuefeng
Chen Yulong
Deng Naihao
Li Jing
Zhang Yue
Zhong Ming
Zhu Xianchao
Publication date: 03/05/2022

We propose the shared task of cross-lingual conversation summarization, the ConvSumX Challenge, opening new avenues for researchers to investigate solutions that integrate conversation summarization and machine translation. This task can be particularly useful due to the emergence of online meetings and conferences. We construct a new benchmark, covering 2 real-world scenarios and 3 language directions, including a low-resource language. We hope that ConvSumX can motivate researchers to go beyond English and break the barrier for non-English speakers to benefit from recent advances in conversation summarization.

arXiv.org e-Print Archive